Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning

Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well-studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which ooers an elegant way of linking these two paradigms. Although sensitive discount optima...

متن کامل

Linear Programming for Finite State Multi-Armed Bandit Problems

1. iBtrodactfMu An important sequential control problem with a tractable solution is the multi-armed bandit problem. It can be stated as follows. There are N independent projects, e.g., statistical populations (see Robbins 19S2), gambling machines (or bandits) etc.. The state of the pth of them at time t is denoted by x,it) and it belongs to a set of possible states S, which in this paper is as...

متن کامل

Process-Oriented Planning and Average-Reward Optimality

We argue that many AI planning problems should be viewed as process-oriented, where the aim is to produce a policy or behavior strategy with no termination condition in mind, as opposed to goal-oriented. The full power of Markov decision models, adopted recently for AI planning, becomes apparent with process-oriented problems. The question of appropriate optimality criteria becomes more critica...

متن کامل

Finite budget analysis of multi-armed bandit problems

In the budgeted multi-armed bandit (MAB) problem, a player receives a random reward and needs to pay a cost after pulling an arm, and he cannot pull any more arm after running out of budget. In this paper, we give an extensive study of the upper confidence bound based algorithms and a greedy algorithm for budgeted MABs. We perform theoretical analysis on the proposed algorithms, and show that t...

متن کامل

Algorithms for multi-armed bandit problems

The stochastic multi-armed bandit problem is an important model for studying the explorationexploitation tradeoff in reinforcement learning. Although many algorithms for the problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: The Annals of Applied Probability

سال: 1996

ISSN: 1050-5164

DOI: 10.1214/aoap/1034968239